28 - Deep Learning - Plain Version 2020 [ID:21162]

Welcome back to deep learning and today we want to discuss a few more architectures, in particular the really deep ones.

So here we are really going towards deep learning.

If you want to train deeper models with all the techniques that we've seen so far, you see that we run into a certain kind of saturation.

If you want to go deeper, then you just add layers on top and you would hope that the

training error would go down.

But if you look very carefully, you can see that the 20-layer network has a lower training error and also a lower test set error than, for example, a 56-layer model.

So we cannot just add more and more layers and hope that things get better.

And this effect is not just caused by overfitting.

We are building layers on top, so there must be other reasons, and it is likely that they are related to the vanishing gradient problem.

Maybe one reason could be the ReLUs, the initialization, or the problem of internal covariate shift, for which we already tried batch normalization, ELUs, and SELUs.

But we still have a problem with the poor propagation of activations and gradients.

And we see that if we try to build those very deep models, we get problems with vanishing gradients and we can't train the early layers, which even leads to worse results on the training set.

So I have one solution for you, and that is residual units.

Residual units are a very cool idea.

So what they propose is not to learn the direct mapping f(x), but instead to learn the residual mapping.

So we want to learn h(x), and h(x) is the difference between f(x) and x, i.e. h(x) = f(x) - x.

We could also express this in a different way, and this is actually how it is implemented: you compute your network output f(x) as some layer h(x) plus x.

So the trainable part now essentially sits in a side branch, and that side branch h(x) is the trainable one; on the main branch we just have x plus the side branch, which delivers our estimate y.
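To make this concrete, here is a minimal sketch in PyTorch-style Python (my own illustration, not code from the lecture; the class name ResidualUnit and the layer sizes are made up): the forward pass just computes x plus the learned side branch h(x).

```python
import torch
import torch.nn as nn


class ResidualUnit(nn.Module):
    """Sketch of a residual unit: the output is x plus a learned residual h(x)."""

    def __init__(self, side_branch: nn.Module):
        super().__init__()
        self.h = side_branch              # trainable side branch h(x)

    def forward(self, x):
        return x + self.h(x)              # main branch just adds x back in


# Hypothetical usage: any shape-preserving module can serve as the side branch.
unit = ResidualUnit(nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64)))
y = unit(torch.randn(8, 64))              # y has the same shape as the input
```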

In the original implementation of residual blocks, there was still a difference.

It was not exactly like we had it on the previous slide.

So we had the side branch with a weight layer, batch normalization, ReLU, another weight layer, another batch normalization, then the addition, and then another non-linearity, a ReLU, for one residual block.

This was later changed into using batch norm, ReLU, weight, batch norm, ReLU, weight for the residual block, and it turned out that this configuration was more stable; we then essentially have the identity that is back-propagated on the plus branch.
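As a rough sketch of the two orderings (again my own illustration, assuming simple 3x3 convolution layers; the function names original_branch and preactivation_branch are hypothetical):

```python
import torch.nn as nn


def original_branch(channels):
    """Side branch of the original residual block:
    weight -> batch norm -> ReLU -> weight -> batch norm.
    Afterwards: add x, then a final ReLU on the sum."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
    )


def preactivation_branch(channels):
    """Side branch of the later, more stable ordering:
    batch norm -> ReLU -> weight -> batch norm -> ReLU -> weight.
    Afterwards: add x with no further non-linearity, so the identity
    passes straight through the plus branch."""
    return nn.Sequential(
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, 3, padding=1, bias=False),
    )


# Usage sketch:
#   y_original = relu(original_branch(c)(x) + x)
#   y_preact   = preactivation_branch(c)(x) + x
```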

So this is very nice, because we can then propagate the gradient back into the early layers just through this addition, and we get a much more stable backpropagation this way.
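In formulas (a one-line sketch of the argument, using the notation y = x + h(x) from above), the chain rule shows why:

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial h}{\partial x}\right)
  = \frac{\partial \mathcal{L}}{\partial y}
  + \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial h}{\partial x}
```

Even if the gradient through the side branch h becomes very small, the identity term still carries the gradient unchanged to the earlier layers.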

This then brings us to a complete residual network.

So we cut it at the bottom and show the bottom part on the right-hand side; this is a comparison between VGG, the 34-layer plain network, and the 34-layer residual network.

You can see that it is essentially these skip connections that are being introduced, which allow us to skip over a step and back-propagate into the respective layers.

There's also downsampling involved, and then of course the skip connections also have to be downsampled.

So this is why we have dotted connections at these positions, and we can see that VGG has some 19.6 billion FLOPs while our 34-layer ResNet has only 3.6 billion FLOPs.

So it's also more efficient.
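Where the feature map size changes, such a dotted shortcut could look roughly like this (my own sketch; the class name and the choice of a strided 1x1 convolution as the projection are assumptions, not necessarily what the slide shows): both the side branch and the skip connection are downsampled, so the addition still matches.

```python
import torch
import torch.nn as nn


class DownsampleResidualBlock(nn.Module):
    """Sketch of a residual block at a downsampling stage: the side branch halves
    the spatial size and changes the number of channels, so the skip connection
    (the dotted shortcut) has to be downsampled as well."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Projection shortcut: downsample and match the channel count.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + self.shortcut(x))


# Hypothetical usage: 64 -> 128 channels, spatial size halved from 56 to 28.
block = DownsampleResidualBlock(64, 128)
out = block(torch.randn(1, 64, 56, 56))    # out has shape (1, 128, 28, 28)
```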

Now let's put this to the test and we can now see that already in the 34 layer case

Part of a video series
Accessible via: Open Access
Duration: 00:12:49 min
Recording date: 2020-10-12
Uploaded on: 2020-10-12 18:36:19
Language: en-US

Deep Learning - Architectures Part 3

This video discusses the ideas of residual connections in deep networks that allow going from 20 to more than 1000 layers.


Further Reading:
A gentle Introduction to Deep Learning
